24 research outputs found

    Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

    We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene

    Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

    We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants

    Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
    In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

    Inferring Cell Differentiation Processes Based on Phylogenetic Analysis of Genome-Wide Epigenetic Information: Hematopoiesis as a Model Case

    How cells divide and differentiate is a fundamental question in organismal development; however, the discovery of differentiation processes in various cell types is laborious and sometimes impossible. Phylogenetic analysis is typically used to reconstruct evolutionary processes based on inherent characters. It could also be used to reconstruct developmental processes based on the developmental changes that occur during cell proliferation and differentiation. In this study, DNA methylation information from differentiated hematopoietic cells was used to perform phylogenetic analyses. The results were assessed for their validity in inferring hierarchical differentiation processes of hematopoietic cells and DNA methylation processes of differentiating progenitor cells. Overall, phylogenetic analyses based on DNA methylation information facilitated inferences regarding hematopoiesis.

    Inferring chromatin accessibility during murine hematopoiesis through phylogenetic analysis

    Objective: Diversification of cell types and changes in epigenetic states during cell differentiation processes are important for understanding development. Recently, phylogenetic analysis using DNA methylation and histone modification information has been shown to be useful for inferring these processes. The purpose of this study was to examine whether chromatin accessibility data can help infer these processes in murine hematopoiesis. Results: Chromatin accessibility data could partially infer the hematopoietic differentiation hierarchy. Furthermore, based on the ancestral state estimation of internal nodes, the open/closed chromatin states of differentiating progenitor cells could be predicted with a specificity of 0.86–0.99 and sensitivity of 0.29–0.72. These results suggest that the phylogenetic analysis of chromatin accessibility could offer important information on cell differentiation, particularly for organisms from which progenitor cells are difficult to obtain.
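
    The specificity and sensitivity figures quoted above are standard confusion-matrix ratios. The following is a minimal sketch of how such values could be computed when predicted open/closed states are compared with observed ones. The function name and the boolean encoding (True = open chromatin) are assumptions made for illustration and are not taken from the study.

```python
def sensitivity_specificity(predicted, observed):
    """Return (sensitivity, specificity) for paired boolean calls.

    Assumed encoding (not from the source): True = open chromatin,
    False = closed chromatin, one entry per genomic region.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")

    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # open predicted, open observed
    tn = sum(1 for p, o in pairs if not p and not o)  # closed predicted, closed observed
    fp = sum(1 for p, o in pairs if p and not o)      # open predicted, closed observed
    fn = sum(1 for p, o in pairs if not p and o)      # closed predicted, open observed

    # Sensitivity: fraction of truly open regions that were predicted open.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of truly closed regions that were predicted closed.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

    A high specificity combined with a lower sensitivity, as reported in the abstract, would correspond to few false "open" calls but a larger share of open regions being missed.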

    In silico and in vivo identification of the intermediate filament vimentin that is downregulated downstream of Brachyury during Xenopus embryogenesis

    Brachyury, a member of the T-box transcription family, has been suggested to be essential for morphogenetic movements in various processes of animal development. However, little is known about its critical transcriptional targets. In order to identify targets of Brachyury and understand the molecular mechanisms underlying morphogenetic movements, we first searched the genome sequence of Xenopus tropicalis, the only amphibian genomic sequence available, for Brachyury-binding sequences known as T-half sites, and then screened for the ones conserved between vertebrate genomes. We found three genes that have evolutionarily conserved T-half sites in the promoter regions and examined these genes experimentally to determine whether their expressions were regulated by Brachyury, using the animal cap system of Xenopus laevis embryos. Eventually, we obtained evidence that vimentin, encoding an intermediate filament protein, was a potential target of Brachyury. This is the first report to demonstrate that Brachyury might affect the cytoskeletal structure through regulating the expression of an intermediate filament protein, vimentin
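
    The in silico step described above, searching a genome sequence for binding-site motifs, can be illustrated with a simple exact-match scan over both strands. This is only a sketch under stated assumptions: the abstract does not give the motif, so the motif is passed in as a parameter, and the function names are hypothetical. The study's conservation filter across vertebrate genomes is not modeled here.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(sequence, motif):
    """Return (start, strand) for every exact motif match on either strand.

    `start` is the 0-based position on the forward strand; strand is "+"
    when the motif itself matches and "-" when its reverse complement does.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    width = len(motif)

    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window == motif:
            hits.append((start, "+"))
        elif window == rc_motif:
            hits.append((start, "-"))
    return hits
```

    In practice, the promoter sequence and the motif would be supplied by the user; a position weight matrix would be a more tolerant alternative to exact matching.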

    Interregional Coevolution Analysis Revealing Functional and Structural Interrelatedness between Different Genomic Regions in Human Mastadenovirus D

    Human mastadenovirus D (HAdV-D) is exceptionally rich in type among the seven human adenovirus species. This feature is attributed to frequent intertypic recombination events that have reshuffled orthologous genomic regions between different HAdV-D types. However, this trend appears to be paradoxical, as it has been demonstrated that the replacement of some of the interacting proteins for a specific function with other orthologues causes malfunction, indicating that intertypic recombination events may be deleterious. In order to understand why the paradoxical trend has been possible in HAdV-D evolution, we conducted an interregional coevolution analysis between different genomic regions of 45 different HAdV-D types and found that ca. 70% of the genome has coevolved, even though these are fragmented into several pieces via short intertypic recombination hot spot regions. Since it is statistically and biologically unlikely that all of the coevolving fragments have synchronously recombined between different genomes, it is probable that these regions have stayed in their original genomes during evolution as a platform for frequent intertypic recombination events in limited regions. It is also unlikely that the same genomic regions have remained almost untouched during frequent recombination events, independently, in all different types, by chance. In addition, the coevolving regions contain the coding regions of physically interacting proteins for important functions. Therefore, the coevolution of these regions should be attributed at least in part to natural selection due to common biological constraints operating on all types, including protein-protein interactions for essential functions. Our results predict additional unknown protein interactions. 
    IMPORTANCE Human mastadenovirus D is an exceptionally type-rich human adenovirus species and the causative agent of diseases in a wide variety of tissues, including the ocular region and digestive tract, as well as of opportunistic infections in immunocompromised patients. It is known to have diverged extensively through frequent intertypic recombination events; however, it has also been demonstrated that replacing a component protein of a multiprotein system with a homologous protein causes malfunction. The present study resolved this apparent paradox by examining which genomic regions have coevolved, using a newly developed method. The results revealed that intertypic recombination events have occurred in limited genomic regions and have been avoided in the genomic regions encoding proteins that physically interact for a given function. This approach detects purifying selection against recombination events that would replace only some components of multiprotein systems and therefore predicts physical and functional interactions between different proteins and/or genomic elements.